Mining Very Large Datasets with SVM and Visualization
Abstract
We present a new support vector machine (SVM) algorithm and graphical methods for mining very large datasets. We develop an active selection of training data points that can significantly reduce the size of the training set used in SVM classification. We summarize the massive datasets into interval data and adapt the RBF kernel used by the SVM algorithm to deal with this interval data. We keep only the data points corresponding to support vectors and the representative data points of non-support vectors, and the SVM algorithm then uses this subset to construct the non-linear model. We also use interactive graphical methods to help explain the SVM results. The graphical representation of IF-THEN rules extracted from the SVM models can be easily interpreted by humans, which helps the user understand how the SVM models behave on the data. Numerical test results are obtained on real and artificial datasets.
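The abstract does not give the form of the adapted kernel, so the following is only a minimal sketch of one way an RBF kernel could be evaluated on interval data. It assumes each attribute of a summarized group is stored as a (low, high) pair and that the distance between two intervals is the larger of the gaps between their lower bounds and between their upper bounds (a Hausdorff-style distance). The function names, the min/max summarization, and the default `gamma` are hypothetical and are not taken from the paper.

```python
import math


def summarize(points):
    """Summarize a group of points into one interval vector.

    Assumption: each attribute is reduced to its (min, max) range.
    """
    return [(min(col), max(col)) for col in zip(*points)]


def interval_distance_sq(u, v):
    """Squared distance between two interval vectors.

    Assumption: per attribute, the distance between [ul, uh] and [vl, vh]
    is max(|ul - vl|, |uh - vh|); the squares are summed over attributes.
    """
    total = 0.0
    for (ul, uh), (vl, vh) in zip(u, v):
        d = max(abs(ul - vl), abs(uh - vh))
        total += d * d
    return total


def interval_rbf_kernel(u, v, gamma=0.5):
    """RBF kernel exp(-gamma * d^2) using the interval distance above."""
    return math.exp(-gamma * interval_distance_sq(u, v))


# Two groups of 2-D points, each summarized into an interval vector.
group_a = summarize([[0.0, 1.0], [2.0, 3.0]])
group_b = summarize([[1.0, 1.0], [2.0, 3.0]])
similarity = interval_rbf_kernel(group_a, group_b)
```

When every interval is degenerate (low equals high), this distance reduces to the ordinary Euclidean distance, so the sketch falls back to the standard RBF kernel on point data.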
Related resources
Incremental SVM and Visualization Tools for Bio-medical Data Mining
Most of the bio-data analysis problems process datasets with a very large number of attributes and few training data. This situation is usually suited for support vector machine (SVM) approaches. We have implemented a new column-incremental linear proximal SVM to deal with this problem. Without any feature selection step, the algorithm can deal with very large datasets (at least 10 attributes) ...
Enhancing SVM with Visualization
Understanding the result produced by a data-mining algorithm is as important as its accuracy. Unfortunately, support vector machine (SVM) algorithms provide only the support vectors, which are used as a black box to classify the data efficiently and with good accuracy. This paper presents a cooperative approach using SVM algorithms and visualization methods to gain insight into a model construction task wit...
Towards High Dimensional Data Mining with Boosting of PSVM and Visualization Tools
We present a new supervised classification algorithm that uses boosting with support vector machines (SVM) and is able to deal with very large data sets. Training an SVM usually requires solving a quadratic program, so the learning task for large data sets requires large memory capacity and a long time. The proximal SVM proposed by Fung and Mangasarian is another SVM formulation that is very fast to train because it...
A Simple, Fast Support Vector Machine Algorithm for Data Mining
Support Vector Machines (SVM) and kernel-related methods have been shown to build accurate models, but the learning task usually requires solving a quadratic program, so for large datasets it requires a large memory capacity and a long time. The new incremental, parallel and distributed SVM algorithm using linear or non-linear kernels proposed in this paper aims at classifying very large datas...